Many data clustering problems can be interpreted as clustering of vertices of graphs. Graph bipartitioning problem is to partition vertices into subsets such that the connections within subsets are stronger than the connections between different subsets.
Partition of the vertices into two subsets is done according to signs of the eigenvector of the second smallest eigenvalue of the Laplacian matrix.
The reader should be familiar with the basic graph theory, linear algebra and, in particular, eigenvalues and eigenvectors.
The reader should be able to apply graph spectral bipartitioning and recursive bipartitioning to data clustering problems.
Credits. The notebook is based on I. Mirošević, Spectral Graph Partitioning and Application to Knowledge Extraction.
For more details, see W. H. Haemers, Matrices and Graphs and S. Butler and F. Chung, Spectral Graph Theory and the references therein.
A weighted graph is an ordered triplet $G=(V,E,\omega)$, where $V=\{1,2,3,...,n\}$ is the set of vertices , $E=\{(i,j)\}$ is a set of edges connecting vertices, and $\omega$ is a set of weights of edges. We assume $G$ is undirected.
An adjacency matrix of graph $G$ is the matrix $A$ defined as $A_{ij}=\begin{cases} 1 \quad \textrm{if}\ (i,j)\in E, \\ 0\quad \textrm{otherwise} \end{cases}.$
A weight matrix of graph $G$ is the matrix $W$ defined as $W_{ij}=\begin{cases} \omega(e) \quad \textrm{if}\ e=(i,j)\in E, \\ 0\quad \textrm{otherwise} \end{cases}.$
A Laplacian matrix of graph $G$ is the matrix $L=D-W$, where $D=\mathop{\mathrm{diag}}(d_1,d_2,\ldots,d_n)$ with $d_i=\sum_{k=1}^n W_{ik}$ for $i=1,\ldots,n$.
A normalized Laplacian matrix is the matrix $L_n=D^{-1/2} L D^{-1/2}\equiv D^{-1/2} (D-W) D^{-1/2}$ (the scaled matrix of $L$).
An incidence matrix of graph $G$ is the $|V|\times |E|$ matrix $I_G$. Each row of $I_G$ corresponds to a vertex of $G$ and each column corresponds to an edge of $G$. In the column corresponding to en edge $e=(i,j)$, all elements are zero except the ones in the $i$-th and $j$-th row, which are equal to $\sqrt{\omega(e)}$ and $-\sqrt{\omega(e)}$, respectively.
Graph types and algorithms are implemented in the package LightGraphs.jl.
Plotting graphs is done by the packages GraphPlot.jl.
As a minor inconvenience, we can only plot unweighted graphs and plot weights as edge labels.
In [1]:
using LightGraphs
using GraphPlot
In [2]:
# varinfo(LightGraphs)
In [3]:
# Sources, targets and weights
n=7
sn=[1,1,1,2,2,3,3,3,5,5,6]
tn=[2,3,4,4,5,4,6,7,6,7,7]
wn=[2,3,4,7,1,3,2,1,7,3,5]
[sn tn wn]
Out[3]:
In [4]:
# Create the graph
G=Graph(n)
for i=1:length(sn)
add_edge!(G,sn[i],tn[i])
end
G
Out[4]:
In [5]:
# What is the optimal bipartition?
gplot(G, nodelabel=1:n, edgelabel=wn)
Out[5]:
In [6]:
# Now some functions
using SparseArrays
using LinearAlgebra
function my_weight_matrix(src::Array,dst::Array,weights::Array)
n=nv(G)
sparse([src;dst],[dst;src],[weights;weights],n,n)
end
my_laplacian(W::AbstractMatrix)=spdiagm(0=>vec(sum(W,dims=2)))-W
function my_normalized_laplacian(L::AbstractMatrix)
D=1.0./sqrt.(diag(L))
n=length(D)
[L[i,j]*(D[i]*D[j]) for i=1:n, j=1:n]
end
Out[6]:
In [7]:
W=my_weight_matrix(sn,tn,wn)
Out[7]:
In [8]:
Matrix(W)
Out[8]:
In [9]:
L=my_laplacian(W)
Matrix(L)
Out[9]:
In [10]:
Ln=my_normalized_laplacian(L)
Out[10]:
In [11]:
issymmetric(Ln)
Out[11]:
In [12]:
# Let us compute the incidence matrix
function my_incidence_matrix(G::Graph, weights::Array)
A=zeros(nv(G),ne(G))
k=1
for a in edges(G)
A[a.dst,k]=sqrt.(weights[k])
A[a.src,k]=-sqrt(weights[k])
k+=1
end
A
end
Out[12]:
In [13]:
Iᵧ=my_incidence_matrix(G,wn)
Out[13]:
$L=I_{G}I_{G}^{T}$.
$L$ is symmetric PSD matrix.
$L\mathbf{1}=0$ for $\mathbf{1}=[1,...,1]^{T}$, thus $0$ is an eigenvalue of $L$ and $\mathbf{1}$ is the corresponding eigenvector.
If $G$ has $c$ connected components, then $L$ has $c$ eigenvalues equal to $0$.
For every $x\in \mathbb{R}^{n}$, it holds $x^{T}L x=\sum\limits_{i<j}W_{ij}(x_{i}-x_{j})^{2}$.
For every $x\in\mathbb{R}^{n}$ and $\alpha,\beta\in\mathbb{R}$, it holds $(\alpha x+\beta \mathbf{1})^{T} L (\alpha x+\beta \mathbf{1}) =\alpha^{2} x^{T}L x$.
Assume that the eigenvalues of $L$ are increasingly ordered. Then, $$ 0=\lambda_1(L)\leq \lambda_2(L)\leq \cdots \leq\lambda_{n}(L)\leq 2\max\limits_{i=1,\cdots ,n}d_{i}. $$
$\sigma(L_n) \subseteq [0,2]$.
In [14]:
# Fact 1
norm(L-Iᵧ*Iᵧ')
Out[14]:
In [15]:
# Facts 2 and 7
using Arpack
issymmetric(L), eigs(L)[1], 2*maximum(diag(L))
Out[15]:
In [16]:
# Fact 3
L*ones(n)
Out[16]:
In [17]:
# Fact 5
x=rand(n)
x'*L*x, sum([W[i,j]*(x[i]-x[j])^2 for i=1:n, j=1:n])/2
Out[17]:
In [18]:
# Fact 6
α,β=rand(),rand()
(α*x+β*ones(n))'*L*(α*x+β*ones(n)), α^2*x'*L*x
Out[18]:
In [19]:
# Fact 8
eigvals(Ln)
Out[19]:
Let $\pi=\{V_{1},V_{2}\}$ be a partition of $V$ with $V_1,V_2\neq \emptyset$, $V_1\cup V_2=V$, and $V_1\cap V_2=\emptyset$.
A cut of a partition $\pi$ is the sum of weights of all edges between $V_1$ and $V_2$,
$$\mathop{\mathrm{cut}}(\pi)\equiv \mathop{\mathrm{cut}}(V_1,V_2)=\sum\limits_{{\displaystyle i\in V_{1} \atop \displaystyle j\in V_{2}}}W_{ij}.$$A weight of a vertex $i\in V$ is the sum of the weights of all edges emanating from $i$, $\omega(i)=\sum\limits_{j=1}^{n}W_{ij}$.
A weight of a subset $\bar V\subset V$ is the sum of the weights of all vertices in $\bar V$, $\omega(\bar V)=\sum\limits_{\displaystyle i\in\bar V} \omega(i)$.
A proportional cut of a partition $\pi$ is
$$ \mathop{\mathrm{pcut}}(\pi)=\displaystyle\frac{\mathop{\mathrm{cut}}(\pi)}{|V_{1}|}+\frac{\mathop{\mathrm{cut}}(\pi)}{|V_{2}|}. $$A normalized cut of a partition $\pi$ is
$$ \mathop{\mathrm{ ncut}}(\pi)=\displaystyle\frac{\mathop{\mathrm{cut}}(\pi)}{\omega(V_{1})}+\frac{\mathop{\mathrm{cut}}(\pi)}{\omega(V_{2})}. $$Consider the following partitions (all edges have unit weights):
The left partition $\pi$ v.s. the right partition $\pi^{\prime}$:
$\mathop{\mathrm{cut}}(\pi)=2$ v.s. $\mathop{\mathrm{cut}}(\pi^{\prime})=3$
$\mathop{\mathrm{pcut}}(\pi)=\frac{2}{1}+\frac{2}{11}=2.18$ v.s. $\mathop{\mathrm{pcut}}(\pi^{\prime})=\frac{3}{6}+\frac{3}{6}=1$
$\mathop{\mathrm{ncut}}(\pi)=\frac{2}{2}+\frac{2}{50}=1.04$ v.s. $\mathop{\mathrm{ncut}}(\pi^{\prime})=\frac{3}{27}+\frac{3}{25}=0.23$
The informal description of the bipartitioning problem can be formulated as two problems, $$ \mathop{\textrm{arg min}}\limits_{\pi} \mathop{\mathrm{pcut}}(\pi) \quad \textrm{or} \quad \mathop{\textrm{arg min}}\limits_{\pi} \mathop{\mathrm{ncut}}(\pi). $$ The first problem favors partitions into subsets with similar numbers of vertices, while the second problem favors partitions into subsets with similar weights.
Both problems are NP-hard.
Approximate solutions can be computed by suitable relaxations in $O(n^2)$ operations.
The partition $\pi$ is defined by the vector $y$ such that $$ y_{i}= \begin{cases} \frac{1}{2} & \text{for } i\in V_1 \\ -\frac{1}{2} & \text{for } i\in V_2 \end{cases} $$ The proportional cut problem can be formulated as the discrete proportional cut problem $$ \underset{\displaystyle \big|\mathbf{y}^{T}\mathbf{1} \big|\leq \beta} {\min\limits_{\displaystyle y_{i}\in \{-1/2,1/2\}}} \frac{1}{2}\sum_{i,j}(y_{i}-y_{j})^{2}W_{ij}. $$ Parameter $\beta$ controls the number of vertices in each subset.
The normalized cut problem can be formulated as the discrete normalized cut problem $$ \underset{\displaystyle \big|y^{T}D\mathbf{1} \big|\leq \beta} {\min\limits_{\displaystyle y_{i}\in \{-1/2,1/2\}}} \frac{1}{2}\sum_{i,j}(y_{i}-y_{j})^{2}W_{ij}. $$ Parameter $\beta$ controls the weights of each subset.
Using the Fact 5 above, the discrete proportional cut problem can be formulated as the relaxed proportional cut problem $$ \underset{\displaystyle y^{T}y=1}{\underset{\displaystyle \big| y^{T}\mathbf{1} \big| \leq 2\frac{\beta}{\sqrt{n}}} {\min\limits_{\displaystyle y\in \mathbb{R}^{n}}}} y^{T}L y. $$ Similarly, the discrete normalized cut problem can be formulated as the relaxed normalized cut problem $$ \underset{\displaystyle y^{T}Dy=1}{\underset{\displaystyle \big| y^{T}D\mathbf{1}\big| \leq \displaystyle \frac{\beta}{\sqrt{\theta n}}}{\min\limits_{\displaystyle y\in \mathbb{R}^{n}}}}y^{T}L_n y. $$
The Main Theorem. Let $A\in \mathbb{R}^{n\times n}$ be a symmetric matrix with eigenvalues $\lambda _{1}<\lambda _{2}<\lambda_{3}\leq \cdots \leq \lambda _{n}$ and let $v^{[1]},v^{[2]},\ldots,v^{[n]}$ be the corresponding eigenvectors. For the fixed $0\leq \alpha <1$, the solution of the problem $$ \underset{\displaystyle y^{T}y=1}{\underset{\displaystyle \left|y^{T}v^{[1]}\right|\leq \alpha} {\min\limits_{\displaystyle y\in \mathbb{R}^{n}}}} y^{T}Ay $$ is $y=\pm \alpha v^{[1]}\pm \sqrt{1-\alpha^{2}}v^{[2]}$. (For the proof see D. J. Higham and M. Kibble, A Unified View of Spectral Clustering.)
For $0\leq \beta <\frac{n}{2}$, the solution of the relaxed proportional cut problem is $$ y=\pm \frac{2\beta}{n\sqrt{n}}\mathbf{1}\pm \sqrt{1-4\frac{\beta ^{2}}{n^{2}}}v^{[2]}, $$ where $v^{[2]}$ is an eigenvector corresponding to $\lambda_2(L)$. $v^{[2]}$ the Fiedler vector. Since the first summand carries no information, $V$ is partitioned according to the signs of the components of $v^{[2]}$: $$ V_{1}=\{i:v^{[2]}_i <0\}, \quad V_{2}=\{i:v^{[2]}_i \geq 0\}. $$ Notice that the value of $\beta$ is irrelevant for the solution.
For $0\leq \beta <\sqrt{\theta n}\left\Vert D^{\frac{1}{2}}\mathbf{1} \right\Vert _{2}$, the solution of the relaxed normalized cut problem is $$ y=\pm \frac{\beta }{\sqrt{\theta n}\left\Vert D^{\frac{1}{2}} \mathbf{1}\right\Vert _{2}^{2}}\mathbf{1}\pm \sqrt{1-\frac{\beta ^{2}}{ \theta n\left\Vert D^{\frac{1}{2}}\mathbf{1}\right\Vert _{2}^{2}}}D^{-\frac{1 }{2}} v^{[2]}, $$ where $v^{[2]}$ is an eigenvector corresponding to $\lambda_2(L_n)$. $V$ is partitioned according to the signs of the components of $v^{[2]}$, as above.
Neither of the relaxed algorithms is guaranteed to solve exactly the true (proportional / normalized) cut problem. However, the computed solutions are in the right direction. Whether to use proportional or normalized cut formulation, depends upon the specific problem.
In [20]:
Matrix(L)
Out[20]:
In [21]:
# Voila!
λ,v=eigs(L,nev=2,which=:SM, v0=ones(n))
Out[21]:
In [22]:
v
Out[22]:
In [23]:
λ,v=eigs(Ln,nev=2,which=:SM, v0=ones(n))
Out[23]:
In [24]:
v
Out[24]:
A complete graph has edges connecting each pair of vertices.
To a set of points $X=\{x_{1},x_{2},\cdots ,x_{m}\}$ , where $x_{i}\in\mathbb{R}^{n}$, we assign a weighted complete graph $G=(V,E)$ with $m$ vertices, where the vertex $j\in V$ corresponds to the point $x_j\in X$.
The main idea is to assign weight of an edge $e=(i,j)$ which reflects the distance between $x_i$ and $x_j$, something like $\omega(e)=\displaystyle\frac{1}{\mathop{\mathrm{dist}}(x_i,x_j)}$.
However, this has to be implemented with care. For example, using simple Euclidean distance yield the same results as the function kmeans()
. In this example we use Gaussian kernel, that is
$$
\omega(e)=e^{\displaystyle -\|x_i-x_j\|_2^2/\sigma^2},
$$
where the choice of $\sigma$ is based on experience.
The computation of various distances is implemented in the package Distances.jl.
We will construct the Laplace matrix directly.
In [25]:
using Plots
using Distances
In [29]:
# Generate two concentric rings
k=2
import Random
Random.seed!(421)
# Center
center=[rand(-5:5);rand(-5:5)]
# Radii
radii=Random.randperm(10)[1:k]
# Number of points in circles
sizes=rand(1000:2000,k)
# radii=[1;10]
center,radii,sizes
Out[29]:
In [30]:
# Points
m=sum(sizes)
X=Array{Float64}(undef,2,m)
first=0
last=0
for j=1:k
first=last+1
last=last+sizes[j]
# Random angles
ϕ=2*π*rand(sizes[j])
for i=first:last
l=i-first+1
X[:,i]=center+radii[j]*[cos(ϕ[l]);sin(ϕ[l])]+(rand(2).-0.5)/50
end
end
scatter(X[1,:],X[2,:],aspect_ratio=:equal)
Out[30]:
In [31]:
# Weight matrix
W=1 ./pairwise(SqEuclidean(),X,dims=2)
Out[31]:
In [32]:
# Laplacian matrix
for i=1:m
W[i,i]=0
end
L=Diagonal(vec(sum(W,dims=2)))-W
# Check Fact 3
norm(L*ones(m))
Out[32]:
In [33]:
# Notice λ₁
λ,v=eigs(L,nev=2,which=:SM, v0=ones(m))
Out[33]:
In [34]:
# Define clusters
C=ones(Int64,m)
C[findall(v[:,2].>0)].=2
Out[34]:
In [35]:
# Yet another plotting function
function plotKpartresult(C::Vector,X::Array)
k=maximum(C)
# Clusters
scatter(X[1,findall(C.==1)],X[2,findall(C.==1)])
for j=2:k
scatter!(X[1,findall(C.==j)],X[2,findall(C.==j)])
end
scatter!(aspect_ratio=:equal)
end
Out[35]:
In [36]:
plotKpartresult(C,X)
Out[36]:
This is the same partitioning as obtained by kmeans()
. Let us try Gaussian kernel. A rule of thumb is: if rings are close, use $\sigma<1$, if rings are apart, use $\sigma>1$.
In [37]:
σ=1.2 # 0.1
W=exp.(-pairwise(SqEuclidean(),X,dims=2)/σ^2)-I
L=diagm(0=>vec(sum(W,dims=2)))-W
λ,v=eigs(L,nev=2,which=:SM, v0=ones(m))
C=ones(Int64,m)
C[findall(v[:,2].>0)].=2
plotKpartresult(C,X)
Out[37]:
Let $G=(V,E)$ be a weighted graph with weights $\omega$.
Let $\pi_k =\{V_{1},V_{2},...,V_{k}\}$ be a $k$-partition of $V$, with $V_i\neq \emptyset$ for $i=1,\ldots,k$.
The previous definition of $\mathop{\mathrm{cut}}(\pi)\equiv \mathop{\mathrm{cut}}(\pi_2)$ extends naturally to $k$-partition. A cut of a partition $\pi_k$ is
$$ \mathop{\mathrm{cut}}(\pi_k)=\sum\limits_{\displaystyle i<j} \mathop{\mathrm{cut}}(V_{i},V_{j}), $$where $\mathop{\mathrm{cut}}(V_{i},V_{j})$ is interpreted as a cut of the bipartition of the subgraph of $G$ with vertices $V_1\cup V_2$.
A proportional cut of a partition $\pi_k$ is
$$ \mathop{\mathrm{pcut}}(\pi_k)=\underset{i<j}{\sum\limits_{i,j=1}^{k}} \left( \frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{|V_{i}|}+\frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{|V_{j}|}\right) = \sum_{i=1}^{k}\frac{\mathop{\mathrm{cut}}(V_{i},V\backslash V_{i})}{|V_{i}|}. $$A normalized cut of a partition $\pi_k$ is
$$ \mathop{\mathrm{ncut}}(\pi_k)=\underset{i<j}{\sum\limits_{i,j=1}^{k}} \left( \frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{\omega(V_{i})}+\frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{\omega(V_{j})}\right) = \sum_{i=1}^{k}\frac{\mathop{\mathrm{cut}}(V_{i},V\backslash V_{i})}{ \omega(V_{i})}. $$If we want to cluster vertices of graph $G=(V,E)$ into $k$ clusters, we can apply the following recursive algorithm:
Initialization. Compute the bipartition $\pi=\{V_{1},V_{2}\}$ of $V$. Set the counter $c=2$.
Recursion. While $c<k$ repeat:
Compute the bipartition of each subset of $V$.
Among all $(c+1)$-partitions, choose the one with the smallest $\mathop{\mathrm{pcut}}(\pi_{c+1})$ or $\mathop{\mathrm{ncut}}(\pi_{c+1})$, respectively.
Set $c=c+1$.
Stop.
There is no guarantee for optimality of this algorithm. Clearly, the optimal $k$-partiton may be a subpartition of one of the discarded partitions.